This paper is directed to those who are undertaking
evaluation of a bilingual program for the first time or who have
already struggled with the mysteries of such an undertaking. Emphasis
is given to the reporting requirements of the various federal and
state funding agencies. The bilingual-bicultural program structure is
defined so the evaluator can see the interplay of program prototypes,
student language facility, and instructional approach. The evaluation
process is divided into an explication of evaluation models,
evaluation design, and instrumentation. Examples of each of these
process components are given. (Author)
Bilingual instructional programs are not a new phenomenon in the
United States. Indeed, such programs existed as early as the 1890s.
Those first programs were few, aimed at unique and small populations,
and frequently taught in parochial schools, primarily as an attempt to
maintain a particular ethnic identity. It was not until the last
decade, with its attendant educational reforms and funding efforts,
that bilingual programs flourished and received recognition. With this
recognition came the demand for accountability and evaluation.
Bilingual programs are sufficiently unique that there are troublesome
problems attendant upon their evaluation. This paper will address
these problems and offer some guides to a more effective evaluation
effort.



HISTORICAL DEVELOPMENT



The development and full flowering of bilingual educational programs
are relatively recent in American education. The melting pot theory of
cultural and linguistic assimilation, which dominated education and
culture in the United States until the 1960s, required instruction and
reports of its results to be in English.

The massive educational reform triggered by the Elementary and
Secondary Education Act (ESEA) of 1965 was the first systematic effort
to identify and treat the educational deficiencies of students with
problems stemming from inadequate command of the English language. The
earliest such efforts were typically labeled "ESL," or "English as a
Second Language." ESL's main instructional objective was the
development of competence in both written and spoken English. The
assessment of such programs was relatively simple, since the
attainment of the objective could be readily determined through a
variety of existing measures, ranging from standard vocabulary lists
(for determining acquisition of sight vocabulary) to the whole set of
available achievement tests (for determining reading comprehension
skills in English).

These early, and often primitive, components of Title I were soon
augmented by the more comprehensive thrusts of federal funding under
the 1967 amendments to P.L. 89-10, which created Title VII, the
Bilingual Program. Under the auspices of this act, the concepts of
instructional intervention were enlarged from the ESL focus to a
multicomponent program including staff development, community
involvement, and development of instructional materials. This response
to Title VII was the genesis of the programmatic effort known today
under the general rubric of bilingual-bicultural programs.

Fishman and Lovas (fi) have identified a continuum of programs ranging
from ESL to full bilingual-bicultural. Martinez and Housden (6) trace
the evolution of the various efforts, and call for a clarification of
the definition of bilingual-bicultural. In addition, they propose a
multidimensional evaluation framework integrating instructional
approaches, program types, and student language facility. They warn
that fully effective evaluation is in jeopardy until criteria mutually
agreed upon by evaluators and bilingual educators are established.

As the nature and variety of program types proliferate, so do the
evaluation problems, especially when information must be aggregated
across programs, as in federal, state, and other large-scale
endeavors. Evaluation has progressed from a relatively simple process
of describing a program and judging its worth to an
evaluative-research process where one is asked not only to make a
statement about the worthiness of an endeavor but to contrast its
effect with that of other instructional methods or programs. The
American Institutes for Research (li), through a contract with
U.S.O.E., searched for effective bilingual-bicultural programs for
dissemination. Only about five percent of the existing programs could
present data to support judgment as to their impact. While the
criteria used were rigorous, they were not unreasonable: evidence from
the exemplary programs had to show the following outcomes:

Evidence of bilingual program impact should be based on objective
measurements obtained from sizeable pupil samples. Achievement gain
measures should be estimated for program participants and for a
comparable control group. Well-designed contrasts with pre-program
baseline or comparison with appropriate norm reference groups are also
acceptable. It is necessary that gains for program participants be
significantly greater than gains for the control or comparison group.

Interpretation of the significance of the reported gains depends on
customary psychometric and statistical grounds. Measurements should be
reliable and valid. Tests should be of appropriate difficulty level
for the groups examined. The reporting of achievement in either
grade-equivalent or raw-score scales is acceptable; one scale is
essentially a linear transformation of the other except at extreme
ranges. Average gains for pupils in the comparison groups should be
unbiased estimates of the gains for the total population of
participants; that is, missing data or the effects of selection should
not be great enough to cast doubt on the findings. Confidence in the
generalizability and potential for replicability are also greater when
results are reported for several classes and grade levels, so that
unique teacher or administrator effects can be ruled out.

Statistical significance should be demonstrated so that one may
confidently conclude that the results showing superior program effect
did not occur by chance; that is, results showing significant program
effect, when in fact there is none, should occur no more than five
percent of the time. In addition, mean gain differences between
program and control groups must be educationally relevant whether
reported as grade equivalents or as relative within-group standard
deviation units. For example, mean differences of the order of
one-half grade equivalent, or one-half standard deviation, are
meaningful, as is a large positive shift in mean percentiles between
pre- and posttests when program outcomes are compared to those of norm
reference groups. [Pp. 8-9]
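
The contrast these criteria call for can be illustrated with a short
computation. The following is a minimal sketch, not the procedure used
in the study quoted above: it assumes gain scores are already in hand
for a program group and a comparison group, uses Welch's t statistic
as one customary test of the difference, and expresses the difference
in pooled within-group standard deviation units. The function name,
the dictionary keys, the one-half standard deviation default, and the
gain scores are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def gain_comparison(program_gains, control_gains, threshold_sd=0.5):
    """Contrast gain scores for a program group and a comparison group."""
    n_p, n_c = len(program_gains), len(control_gains)
    mean_p, mean_c = mean(program_gains), mean(control_gains)
    sd_p, sd_c = stdev(program_gains), stdev(control_gains)
    diff = mean_p - mean_c

    # Welch's t statistic: difference in mean gains over its standard error.
    t_stat = diff / sqrt(sd_p ** 2 / n_p + sd_c ** 2 / n_c)

    # Pooled within-group standard deviation, used to express the
    # difference in standard deviation units.
    pooled_sd = sqrt(
        ((n_p - 1) * sd_p ** 2 + (n_c - 1) * sd_c ** 2) / (n_p + n_c - 2)
    )
    sd_units = diff / pooled_sd

    return {
        "mean_difference": diff,
        "t_statistic": t_stat,
        "sd_units": sd_units,
        # Assumption: "educationally relevant" is read here as a
        # difference of at least one-half standard deviation.
        "educationally_relevant": sd_units >= threshold_sd,
    }


# Hypothetical grade-equivalent gains, for illustration only.
program = [1.2, 0.9, 1.5, 1.1, 1.3, 1.0]
control = [0.6, 0.8, 0.5, 0.9, 0.7, 0.4]
print(gain_comparison(program, control))
```

The t statistic would still have to be referred to a t table (or a
statistical package) at the five percent level; the sketch stops short
of that step to stay within the standard library.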

While these criteria are acceptable for judging the impact of
exemplary programs, one could argue that their rigor is not necessary
for unique local efforts. However, the frustration of federal
authorities with the seeming lack of demonstrably successful efforts
has culminated in the issuance of new regulations regarding the
conduct and evaluation of programs carried out under the aegis of
Title VII. The regulations issued in April 1976 state in part:

(iii) A description of the evaluation design of the proposed program.
Such evaluation design shall include provisions for assessing the
applicant's progress in achieving the objectives set out in its
application for assistance. In the case of an application to carry out
the activities described in Sl^ii.Ti la), the evaluation design shall
also include the following:

(A) Provisions for comparing the performance of participating children
on tests of reading skills in English and in the language other than
English to be used in the proposed program with an estimate of what
such children's performance would have been in the absence of the
program. Where the applicant chooses to base such estimate on the
performance of nonparticipating but similar children on such tests,
the evaluation design shall include a description of the methods used
to identify nonparticipating but similar children for such purpose;

(B) A description of instruments of measurement to be used by the
applicant in evaluating the performance of participants in the
program, the rationale for selecting these instruments, and procedures
to be followed in their use; and

(C) Provisions for reporting pre-test and post-test results on reading
tests for all participating children (and, where their performance is
compared with the performance of nonparticipating but similar
children, for all such nonparticipating children) using mean scores,
standard deviations and appropriate tests of statistical significance.
No application which fails to include the elements of an evaluation
design described in this paragraph will be approved for assistance
under this subpart. [P. M9901]

Clearly, the direction from the federal authorities is toward more
rigor and toward an evaluation-research approach.
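
The reporting provision quoted above (pre-test and post-test means,
standard deviations, and a test of significance) can be sketched as
follows. This is a minimal illustration only: it assumes each
participant has both a pre-test and a post-test score, and it uses the
paired t statistic as one "appropriate" test, a choice the regulation
itself does not specify. The function name, the dictionary keys, and
the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pre_post_report(pre_scores, post_scores):
    """Summarize matched pre-test and post-test scores for one group."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("each pupil needs both a pre and a post score")

    # Gain for each pupil, in the order the pupils are listed.
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    n = len(gains)
    sd_gain = stdev(gains)
    if sd_gain == 0:
        raise ValueError("all gains are identical; t is undefined")

    return {
        "n": n,
        "pre_mean": mean(pre_scores),
        "pre_sd": stdev(pre_scores),
        "post_mean": mean(post_scores),
        "post_sd": stdev(post_scores),
        "mean_gain": mean(gains),
        # Paired t statistic: mean gain over its standard error.
        "t_statistic": mean(gains) / (sd_gain / sqrt(n)),
        "degrees_of_freedom": n - 1,
    }


# Hypothetical raw reading scores for five pupils.
pre = [10, 12, 9, 14, 11]
post = [13, 15, 10, 18, 14]
report = pre_post_report(pre, post)
for key, value in report.items():
    print(key, value)
```

The same function could be run separately on the scores of
nonparticipating but similar children where such a comparison group is
used.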

Compounding the programmatic difficulties, both state and local, is
the recent Lau v. Nichols decision by the Supreme Court, which, in
essence, states that a conventional ESL approach is no longer
sufficient, but that the instruction must, where necessary, be
conducted in the language of the student. The Lau remedies require, as
a minimum, that:

1) Schools systematically and validly ascertain which of their
students are linguistically different;

2) Schools systematically and validly ascertain the language
characteristics of their students;

3) Schools systematically ascertain the achievement characteristics of
their students; and

4) Schools match an instructional program to the characteristics as
ascertained.

These remedies, taken together, provide the general framework on which
to build an acceptable evaluation plan.



THE BILINGUAL INSTRUCTIONAL PROGRAM



The first requisite of an evaluation plan is an explication of what is
to be evaluated. Typically, the bilingual instructional program has
been poorly defined. To assist the evaluator, Martinez and Housden (6)
have conceptualized a multidimensional framework. The dimensions are:



Program              Student                     Instructional
Prototype            Language Facility           Approach

Transitional         Non-English Speaking        Translation
Monoliterate         Limited-English Speaking    Preview-Review
Partial Bilingual    Fluent-English Speaking     Concurrent
                     (Bilingual)
Full Bilingual                                   Back-to-Back
                                                 Language-Other-Than-
                                                 English Immersion
                                                 Eclectic

The program prototypes can be considered as a continuum from emphasis
on instruction in the dominant language of the surrounding culture to
equal competence in both languages of the student. The types of
programs are described as follows:

Transitional. The native language is used in the early grades only to
facilitate the mastery of the subject matter, so that the child may
eventually be phased into a curriculum totally reflecting the second
language.

Monoliterate. These are programs that address aural-oral competency in
the native language, but focus on the attainment of literacy only in
the second language.

Partial Bilingual. The goal is to attain aural-oral competency and
literacy in both languages, but restrict literacy to subject matter
relevant to cultural heritage, i.e., social science, literature, and
art.

Full Bilingual. The goal is to attain aural-oral competency and
literacy in both languages in all content areas (including mathematics
and science).



The language facility of bilingual students is necessarily an
essential point of focus. Three language facility categories are
sufficient, given the current state of the art of assessing language
facility of bilingual students. They are:

Non-English Speaking. A student who is incapable of appropriately
reacting to statements or directions given by a teacher in the English
language because of the inability to decode verbal English language
messages and because of the inability to cognitively relate an idea in
a language other than his/her primary language is considered to be a
non-English speaking student.

Limited-English Speaking. A student who has not developed English
language skills of comprehension, speaking, reading, and writing
sufficiently to benefit from instruction only in English and who comes
from a home where a language other than English is spoken is
considered to be a limited-English speaking student.

Fluent-English Speaking (Bilingual). A student who can learn equally
well through use of the English language as through . . . his primary
language is considered to be a fluent-English speaking student.

The six instructional approaches are:

Translation. Lessons are presented in English then translated to a
second language. These may be done simultaneously, at a later time
during the day, or even on another day.

Preview-Review. Students receive instruction in two languages in any
specific lesson or subject area. A preview is presented in one
language, followed by a lesson in the second language. Finally a
review may be done either in both languages, or only in the language
of the preview. Usually two language-model instructors are used in
this format. Students in one group may be pre[...] languages on the
content or context of a [...] be conducted in either language. They
[...] be grouped according to primary language, and the presentation
is reviewed in the primary language by the appropriate adult model.

Concurrent. Both languages are used simultaneously in the instruction
of any specific lesson. The objective is to teach concepts in both
languages, avoiding translation. Languages are used interchangeably.
Usually only one bilingual-model instructor is utilized.

Back-to-Back. A designated portion of time, such as in the morning, is
set aside for instruction in one language and another portion of the
day is devoted to instruction of the same curriculum content in the
other language. The student receives instruction in two or more
languages, but at different times during a day.

Language-Other-Than-English Immersion. A language other than English
is used for instruction in academic areas with concentrated
English-as-a-Second-Language development component.

Eclectic. An eclectic approach combines one or more of the
translation, preview-review, concurrent, back-to-back, and LOTE
immersion instructional approaches along with other variations such as
the outmoded English language immersion or often-used saturation
approach with an ESL subcomponent. In practice, the eclectic approach
may be difficult to observe because its definition is somewhat
ambiguous. [Pp. 5 ff]
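
An evaluator who wishes to record where a given classroom falls within
this framework might represent the three dimensions as follows. This
is only a sketch: the class and function names are hypothetical, and
treating every combination of the three dimensions as a possible
"cell" is an assumption made for illustration, not a claim made by
Martinez and Housden.

```python
from enum import Enum
from itertools import product


class Prototype(Enum):
    TRANSITIONAL = "Transitional"
    MONOLITERATE = "Monoliterate"
    PARTIAL_BILINGUAL = "Partial Bilingual"
    FULL_BILINGUAL = "Full Bilingual"


class Facility(Enum):
    NON_ENGLISH = "Non-English Speaking"
    LIMITED_ENGLISH = "Limited-English Speaking"
    FLUENT_ENGLISH = "Fluent-English Speaking (Bilingual)"


class Approach(Enum):
    TRANSLATION = "Translation"
    PREVIEW_REVIEW = "Preview-Review"
    CONCURRENT = "Concurrent"
    BACK_TO_BACK = "Back-to-Back"
    LOTE_IMMERSION = "Language-Other-Than-English Immersion"
    ECLECTIC = "Eclectic"


def framework_cells():
    """Return every (prototype, facility, approach) combination."""
    return list(product(Prototype, Facility, Approach))


def describe(prototype, facility, approach):
    """Give a one-line description of a classroom's place in the framework."""
    return (
        f"{prototype.value} program, {facility.value} students, "
        f"{approach.value} approach"
    )


cells = framework_cells()
print(len(cells), "possible combinations")
print(describe(Prototype.TRANSITIONAL, Facility.LIMITED_ENGLISH,
               Approach.PREVIEW_REVIEW))
```

Recording each classroom in this way makes explicit which program
description the evaluation data are being attached to.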



THE EVALUATION PROCESS



There are three steps the evaluator should take following the
explication of the nature of the program: 1) select an evaluation
model, 2) establish an evaluation design, and 3) select the
appropriate instruments. The first, a model, will provide a framework
which will assist the evaluator by providing a coherent course of
action.

Selecting an Evaluation Model

A useful summary of recently proposed evaluation models can be found
in Worthen and Sanders (12). Each has its proponents. Many evaluators
will select from this assortment of models those parts that seem most
appropriate to their particular situations, and then construct their
own. For example, Stufflebeam's (10) CIPP model will direct the
evaluator to consider the context, input, process, and product
evaluations. One can turn to Tyler (11) to assure that a focus is
given to determine objective attainment. The evaluator should be
generally familiar with other models, particularly those of Scriven
(8) and Stake (9), Scriven for his formative-summative distinction and
Stake for the describing and judging focus. Again, these models will
be useful in providing a framework but should not be adopted intact in
lieu of creating the unique plan necessitated by the nature of a
bilingual evaluation.

The models are general statements designed to guide the evaluation
process; to help the evaluator systematically plan, identify critical
questions to be answered, and gather and analyze the data to answer
these questions.

It would be instructive to take one of the models, the CIPP model by
Stufflebeam, and discuss its application. This is the most
comprehensive model, considering four evaluation types or approaches
which, when taken together, form a single model.

The Context evaluation (the C of CIPP) should be considered, although
it is often overlooked in the total evaluation scheme. Among its many
characteristics noted by Stufflebeam (10) are descriptions and
analyses of the system to be evaluated, descriptions of goals and
objectives, and a focus on the factors known to be important for
achieving these goals. Stufflebeam further states:

Context evaluation provides a basis for stating change objectives
through diagnosing and ranking problems in meeting needs or using
opportunities, and it analyzes change objectives to determine the
amount of change to be effected and the amount of information grasp
available for support. Thereby, it provides an initial basis for
defining objectives operationally, identifying potential
methodological strategies, and developing proposals for outside
funding. [P. 219]

Following the Context evaluation, the evaluator is directed to the I
in CIPP, the Input evaluation. Like the Context, it has several
facets. Generally it can be summarized as

...identifying and assessing 1) relevant capabilities of the
responsible agency, 2) strategies for achieving program goals, and 3)
designs for implementing a selected strategy. This information is
essential for structuring specific designs to accomplish program
objectives. [Pp. 221^-3]

The Context and Input evaluations are in a sense idealized. Frequently
the evaluator comes to a situation where the objectives have been
stated, the instructional framework set, and resources committed. The
previously discussed instructional framework should be sufficiently
comprehensive that the evaluator can infer from it the instructional
strategies. These strategies will determine the specific designs to be
used to meet the program objectives.

The key in this process is to determine and get agreement on the
objectives, the "what" that is to be assessed and judged. The Context
and Input evaluation yields information that sharpens the focus of the
process and product evaluation phases. It provides a background that
goes beyond the instructional objectives to help the evaluator define
other areas of interest: costs, benefits, attitudes of participants,
involvement of parents, and whatever other important objectives have
been identified.

The Process evaluation is perhaps more familiar to the reader. Process
evaluation provides the feedback of information about the program as
it evolves. Stufflebeam posits three main objectives for Process
evaluation:

Process evaluation has three main objectives: the first is to detect
or predict defects in the procedural design or its implementation
during the implementation stages, the second is to provide information
for programmed decisions, and the third is to maintain a record of the
procedure as it occurs. [P. 229]

The Process evaluation strategy is one of flexibility, not a fixed or
rigid process. It is more a sensing process than strictly formal
measurement. Ideally the evaluator is independent of the program or
project staff, yet is in constant communication with them. The
instruments used can be less formal than objective tests (though
objective tests have an important role if used judiciously), typically
consisting of interviews, questionnaires, school records, parent
contacts and reactions, relationships with community agencies, records
of utilization of instructional materials, and so on. Reporting may be
formal or informal but it must be continuous. The most critical phase
is early in the implementation of the program.

Product evaluation is the process that is most familiar to the reader.
It is, however, broader than typically conceived. It occurs not only
as the terminal assessment and analysis process but also during the
implementation of the program, as certain previously determined check
points are reached. Scriven (8) makes the distinction between
instrumental (accomplishments at an intermediate level) and
consequential (the terminal assessment of fundamental attainments).

Reviewing some of the important concepts in bilingual education and
placing them in the CIPP framework will give the reader a sense of
direction. The following questions are appropriate for each type of
evaluation:

Context: What are the values and goals held by the system as related
to bilingual instruction? What are the desired and actual conditions
in the environment: for example, how many students need a bilingual
program? Of what type? What information is needed or exists about the
system, and state and federal regulations and guidelines? What is to
be the role of the evaluator?

Input: What existing capabilities does the system have; 
for example, faculty capability in a second language? 
What instructional framework is appropriate? What 
overall design for instruction and evaluation is desired and 
feasible? 

Process: Is the program proceeding as scheduled? What problems exist
in implementation? Are instructional materials adequate and in place?
What are the attitudes of key people (parents, students, teachers, and
administrators)? Are initial instructional units effective? (Note:
This may also be a Product evaluation.)

Product: What are the attainments of the pupils? Have the objectives
been met? What decisions can be made?

Establishing the Evaluation Design 

The models give general direction; the design yields a specific plan
for data collection and analysis. It would not be appropriate to call
this process an experimental design; that label would be accurate only
in rare situations. True experimental designs call for the
establishment of treatment and control groups, random assignment, and
the like. Such luxuries rarely occur in typical public or [...]al
programs. Should the exceptional case arise, a variety of classical
experimental designs is available.

I believe the very nature of bilingual-bicultural education often
militates against a rigorous experimental-control, random assignment
design. The population is unique, culturally and linguistically,
preventing a readily available comparison group. The instructional
program is typically loosely defined, and the measurement problems are
acute. The evaluator must be ingenious in constructing a design that
is both feasible and capable of yielding evidence of the program's
impact or lack of it.

Realistically, the best approximation available is the series of
designs proposed by Campbell and Stanley (2). The most common of these
is a pre-post assessment with some type of comparison group. The
comparison group could be a group of students who are not in the
program but who have characteristics similar to those of the
participants.

The interrupted time series design is another possibility. Most
simply, this design calls for a series of measurements of student
performance before a treatment, intervention of the treatment, and
then measurements of performance after the treatment. Schematically it
looks like this:

(M)easurement -- M -- M -- (T)reatment -- M -- M -- M

Historical measures could be derived from existing school 
records. 
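
One simple way to read such a series is to project the pre-treatment
trend forward and ask how far the post-treatment measurements depart
from it. The sketch below assumes equally spaced measurements and a
straight-line pre-treatment trend; both are simplifying assumptions
made here, not requirements of the design as Campbell and Stanley
present it. The function names and the scores are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interrupted_time_series(pre, post):
    """Compare post-treatment scores with the projected pre-treatment trend."""
    if len(pre) < 2:
        raise ValueError("at least two pre-treatment measurements are needed")

    # Occasions are numbered 0, 1, 2, ... across the whole series.
    slope, intercept = fit_line(list(range(len(pre))), pre)
    projected = [intercept + slope * (len(pre) + i) for i in range(len(post))]

    # Departure of each obtained score from what the trend alone predicts.
    departures = [obtained - expected
                  for obtained, expected in zip(post, projected)]
    return projected, mean(departures)


# Hypothetical mean reading scores: three occasions before the
# treatment (e.g. from school records) and three after.
before = [40, 42, 44]
after = [50, 52, 54]
projected, average_departure = interrupted_time_series(before, after)
print("projected without treatment:", projected)
print("average departure from trend:", average_departure)
```

A positive average departure suggests movement beyond what the earlier
trend alone would predict, though it does not by itself rule out other
explanations for the shift.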

An alternative is to use the participants as their own control, by
establishing an expected level of attainment and contrasting the
actual, or obtained, level with this expected level. Such a process
gives an [...] of the attainment of objectives but provides only a
minimum of statistical evidence.

Popham (7) provides an excellent discussion of the Campbell and
Stanley quasi-experimental designs.

Selecting the Appropriate Instruments 

The most difficult aspect of any bilingual evaluation is assessing
pupil progress in the instructional components of reading or language
acquisition. External funding sources require some quantitative
estimate of pupil growth both in English and the native or home
language. There are few, if any, appropriate norm-referenced tests in
languages other than English. Limited English-speaking students, in
some cases, may be assessed using available norm-referenced
English-language tests. The lack of appropriate norm-referenced
instruments clearly suggests that any quantitative assessment must
rely heavily on data collected from locally constructed or
criterion-referenced instruments or items.

The use of such instruments, without extensive statistical treatment,
precludes normative comparisons. Although the absence of normative
comparison does not imply the absence of appropriate measurement, and
the information gathered through these instruments may well satisfy
local needs for evidence of program success, the evaluator may be
hard-pressed to satisfy state and federal requirements.

Certain decisions must now be made about measurement concerns.
Regardless of the program prototype and evaluation model, certain
common program-component assessment problems will exist. The various
noninstructional components can best be evaluated directly by
establishing measurable objectives for each and then obtaining
assessment of these objectives. For example, if one of the objectives
for the parent involvement component is "parents will be made aware of
the nature and the process of the instructional program for the
students," parental understanding can be assessed directly by
eliciting indications of their understanding. Parents' attendance at
meetings is not synonymous with understanding.

The same logic holds for staff development, auxiliary services, and
the like. It is important to determine which assessments will be
formative and which summative. Timelines for the accomplishment of
certain evaluations will help in feeding back important information
about the program as it unfolds and will assist in any corrections
that should be made.

In addition, the processes that need baseline data must be identified
during the program-planning phase. Instrumentation for these
noninstructional components can consist of questionnaires, checklists,
structured interviews, observation schedules and, on occasion, locally
developed tests. Guides for the development of such instruments can be
found in Berdie and Anderson (1).

The major problem is the determination of pupil progress with an
assessment program that is psychometrically rational and permits sound
interpretation. The evaluator must be skillful in constructing
appropriate instruments to measure both formative and summative
progress.

Even if the pupils have a command of English sufficient to permit the
use of a standardized test (or groups of items from such a test),
comparison of such a group of students with a norm group of fluent
English-speaking students on some direct basis is not always
meaningful. The scores do, however, provide an index of movement
toward a normative reference. Such movement can also be observed in
the change of P (percent of correct responses of a reference group)
values of items or item clusters judged by the instructional staff to
be relevant to the objectives. Certain statistics, such as pre- and
post-test means, variances, and significance of differences, can be
computed from this type of data.
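
The change in P values described above reduces to simple arithmetic,
as the following sketch shows. It assumes each pupil's responses are
scored 1 (correct) or 0 (incorrect) on the same set of items at
pre-test and post-test; the function names and the response data are
hypothetical.

```python
def p_values(responses):
    """Percent of pupils answering each item correctly.

    `responses` holds one list per pupil, with a 1 or 0 for each item.
    """
    n_pupils = len(responses)
    n_items = len(responses[0])
    return [
        100 * sum(pupil[item] for pupil in responses) / n_pupils
        for item in range(n_items)
    ]


def p_value_shift(pre_responses, post_responses):
    """Change in each item's P value from pre-test to post-test."""
    pre_p = p_values(pre_responses)
    post_p = p_values(post_responses)
    return [post - pre for pre, post in zip(pre_p, post_p)]


# Hypothetical scored responses: four pupils, three items judged by
# the instructional staff to be relevant to the objectives.
pre = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
post = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0]]

print("pre-test P values: ", p_values(pre))
print("post-test P values:", p_values(post))
print("shift per item:    ", p_value_shift(pre, post))
```

Where a published test reports P values for its norm group, the
post-test figures for the local group can be set beside them item by
item to show movement toward that reference.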

Evaluation of bilingual programs is difficult at best: the
instructional programs are usually poorly defined, there is virtually
a total void of appropriate instruments, and the existing evaluation
models and designs can provide only general guidance. There are
movements toward meeting the need for appropriate evaluation tools,
but the development of such instruments is some time away. Until this
major problem is solved, the evaluator must rely on his ingenuity to
provide useful assessment devices and evaluative information.
